I’ve hidden all the code in this markdown by default, to make it easier to get to the map…

Continuing my exploration of Notification of Infectious Disease data, I’ve been working on an interactive map of where diseases have been reported in England and Wales. I found this a useful resource: https://journocode.com/2016/01/28/your-first-choropleth-map/.

I was able to download data from the Notification of Infectious Disease website on the distribution of cases by local authority (“Statutory notifiable diseases - number of cases reported in week 51 of 2017”). I’ve limited this experimental analysis to one week of data. I did some reformatting in Excel and saved the result as a .csv for speed and convenience, before loading it into R. I then explicitly set NA values to 0 (assuming no notifications means no cases). I also removed the string " UA" wherever it appeared in local authority names.

library("reshape2")
library("magrittr")
geo_disease_data<-read.csv("geo_disease_data.csv")
geo_disease_data[is.na(geo_disease_data)] <- 0
geo_disease_data$LocalAuthority<-gsub(" UA", "", geo_disease_data$LocalAuthority)

The next task was to get some map data for local authorities. I played around with a few different maps from the Office for National Statistics Open Geography Portal (I’m new to geo-spatial data). This one seemed to be the best:

library("rgdal")

LocalAuthorities<-readOGR("https://opendata.arcgis.com/datasets/686603e943f948acaa13fb5d2b0f1275_4.geojson")

I then merged the two data sets by local authority name.

LA_Diseases<-sp::merge(LocalAuthorities, geo_disease_data, by.x="lad16nm", by.y="LocalAuthority")

The next task was to display the data as a map. For this I used leaflet, an R package that wraps the Leaflet JavaScript library. I’ve put some comments in the code to explain what’s going on at each step.

library("leaflet")
diseases <- names(geo_disease_data)[-1] #Get a list of disease names (the first column name here is local authority name)
max_cases <- max(melt(geo_disease_data, id.vars="LocalAuthority")$value) #Get the maximum number of cases for any disease
#Set up a colour scale from 0 to the maximum number of cases
pal <- colorNumeric(
   palette = "viridis",
   domain = c(0,max_cases)
 )
#Initialise the map with the legend and a control to allow users to select a disease
myleaflet<-leaflet(width = "100%") %>% 
  addProviderTiles("Esri.WorldGrayCanvas") %>%
  addLegend( pal = pal, 
             values = 0:max_cases,
             title = "Cases",
             opacity = 1) %>%
  addLayersControl(baseGroups=diseases, 
                   position = "bottomleft", 
                   options = layersControlOptions(collapsed = TRUE))
#Add a layer to the map for each disease
for (active_disease in diseases) 
  {
  myleaflet <- myleaflet %>%
    addPolygons(data=LA_Diseases, 
              fillColor=~pal(LA_Diseases[[active_disease]]),
              fillOpacity = 0.8,
              color = "black",
              weight = 1,
              popup = paste(LA_Diseases$lad16nm, LA_Diseases[[active_disease]],"cases"), #This popup shows the local authority name and number of cases
              group = active_disease
              )
  }    
myleaflet #show the map

I have a few problems! The main issue is that I don’t have data for all of the local authorities. To find out why, I got a list of local authorities from the disease data and another list from the map.

Unique_LA_Map<-data.frame(LocalAuthority=unique(LocalAuthorities$lad16nm) %>% sort())
Unique_LA_Diseases<-data.frame(LocalAuthority=unique(geo_disease_data$LocalAuthority) %>% sort())

Next, I did some joins to see how many of the names were shared between the data sets and how many were unique to each.

In_Map_Not_Diseases<-data.frame(LocalAuthority=
  Unique_LA_Map$LocalAuthority[!(Unique_LA_Map$LocalAuthority %in% Unique_LA_Diseases$LocalAuthority)])
In_Diseases_Not_Map<-data.frame(LocalAuthority=
  Unique_LA_Diseases$LocalAuthority[!(Unique_LA_Diseases$LocalAuthority %in% Unique_LA_Map$LocalAuthority)])
In_Both<-base::merge(Unique_LA_Map,Unique_LA_Diseases, by="LocalAuthority")

I’ve plotted the results on a Venn diagram. Not the most beautiful, but quite informative.

library(VennDiagram)
grid.newpage()
draw.pairwise.venn(area1 = nrow(Unique_LA_Diseases), 
                   area2 = nrow(Unique_LA_Map), 
                   cross.area = nrow(In_Both), 
                   category = c("n in Diseases Data", "n in Map Data"),
                   fill = c("blue", "red")
                   )

276 local authorities are in both data sets, 104 are only in the map data and 12 are only in the disease data. A large part of the explanation is that the map includes Scottish local authorities, but the disease data only covers England and Wales. Looking at the data frames, there are some instances where the name is stated differently in each set despite referring to the same authority, e.g. “City of Kingston upon Hull” and “Kingston upon Hull, City of”. And there are some, like “East Dorset”, that are on the map but don’t seem to have a match at all.

My conclusion is that, whilst making maps of infectious disease occurrences is a great way of getting insights out of the data, I would need some domain knowledge about mapping local authorities to take this much further.

---
title: "Map of Infectious Disease Notifications"
output: 
  html_notebook:
    code_folding: hide
---

*I've hidden all the code in this markdown by default, to make it easier to get to the map...*

Continuing my exploration of [Notification of Infectious Disease](https://www.gov.uk/government/collections/notifications-of-infectious-diseases-noids) data, I've been working on an interactive map of where diseases have been reported in England and Wales. I found this a useful resource: https://journocode.com/2016/01/28/your-first-choropleth-map/.

I was able to download data from the [Notification of Infectious Disease](https://www.gov.uk/government/collections/notifications-of-infectious-diseases-noids) website on the distribution of cases by local authority ("Statutory notifiable diseases - number of cases reported in week 51 of 2017"). I've limited this experimental analysis to one week of data. I did some reformatting in Excel and saved the result as a .csv for speed and convenience, before loading it into R. I then explicitly set NA values to 0 (assuming no notifications means no cases). I also removed the string " UA" wherever it appeared in local authority names.

```{r}

library("reshape2")
library("magrittr")

geo_disease_data<-read.csv("geo_disease_data.csv")

geo_disease_data[is.na(geo_disease_data)] <- 0

geo_disease_data$LocalAuthority<-gsub(" UA", "", geo_disease_data$LocalAuthority)
```
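
As a small illustration of the two cleaning steps above, here is a sketch on a made-up data frame (the authority names and counts are invented, not taken from the NOIDs file). It anchors the pattern with `$` so that " UA" is only stripped from the end of a name, which is a slightly stricter variant of the `gsub` call used above.

```r
# Toy data standing in for geo_disease_data; the values are invented for illustration
toy <- data.frame(
  LocalAuthority = c("Bristol, City of UA", "Exeter", "Cornwall UA"),
  Measles = c(NA, 1, NA),
  Mumps = c(2, NA, NA),
  stringsAsFactors = FALSE
)

# Treat missing counts as zero (assumes no notification means no cases)
toy[is.na(toy)] <- 0

# Strip the " UA" suffix, but only where it ends the name
toy$LocalAuthority <- sub(" UA$", "", toy$LocalAuthority)

print(toy)
```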

The next task was to get some map data for local authorities. I played around with a few different maps from [the Office for National Statistics Open Geography Portal](http://geoportal.statistics.gov.uk/datasets/) (I'm new to geo-spatial data). This one seemed to be the best:

```{r, eval=FALSE}

library("rgdal")

LocalAuthorities<-readOGR("https://opendata.arcgis.com/datasets/686603e943f948acaa13fb5d2b0f1275_4.geojson")

```

I then merged the two data sets by local authority name.

```{r}

LA_Diseases<-sp::merge(LocalAuthorities, geo_disease_data, by.x="lad16nm", by.y="LocalAuthority")

```

The next task was to display the data as a map. For this I used [leaflet](http://leafletjs.com/), an R package that wraps the Leaflet JavaScript library. I've put some comments in the code to explain what's going on at each step.

```{r}

library("leaflet")

diseases <- names(geo_disease_data)[-1] #Get a list of disease names (the first column name here is local authority name)

max_cases <- max(melt(geo_disease_data, id.vars="LocalAuthority")$value) #Get the maximum number of cases for any disease

#Set up a colour scale from 0 to the maximum number of cases
pal <- colorNumeric(
   palette = "viridis",
   domain = c(0,max_cases)
 )

#Initialise the map with the legend and a control to allow users to select a disease
myleaflet<-leaflet(width = "100%") %>% 
  addProviderTiles("Esri.WorldGrayCanvas") %>%
  addLegend( pal = pal, 
             values = 0:max_cases,
             title = "Cases",
             opacity = 1) %>%
  addLayersControl(baseGroups=diseases, 
                   position = "bottomleft", 
                   options = layersControlOptions(collapsed = TRUE))

#Add a layer to the map for each disease
for (active_disease in diseases) 
  {
  myleaflet <- myleaflet %>%
    addPolygons(data=LA_Diseases, 
              fillColor=~pal(LA_Diseases[[active_disease]]),
              fillOpacity = 0.8,
              color = "black",
              weight = 1,
              popup = paste(LA_Diseases$lad16nm, LA_Diseases[[active_disease]],"cases"), #This popup shows the local authority name and number of cases
              group = active_disease
              )
  }    


myleaflet #show the map

```
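
The shared maximum used for the colour scale can also be computed without reshaping the data. This is only a minimal sketch on invented counts, and it assumes every column after the first is numeric:

```r
# Invented counts standing in for geo_disease_data
toy <- data.frame(
  LocalAuthority = c("Exeter", "Cornwall"),
  Measles = c(0, 3),
  Mumps = c(7, 1)
)

# Largest count across every disease column (the name column is dropped first)
max_cases <- max(unlist(toy[-1]))
max_cases
```

Using one maximum for every disease keeps the colours comparable between layers, at the cost of rarer diseases looking uniformly pale.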

I have a few problems! The main issue is that I don't have data for all of the local authorities. To find out why, I got a list of local authorities from the disease data and another list from the map.

```{r}
Unique_LA_Map<-data.frame(LocalAuthority=unique(LocalAuthorities$lad16nm) %>% sort())
Unique_LA_Diseases<-data.frame(LocalAuthority=unique(geo_disease_data$LocalAuthority) %>% sort())
```

Next, I did some joins to see how many of the names were shared between the data sets and how many were unique to each.

```{r}

In_Map_Not_Diseases<-data.frame(LocalAuthority=
  Unique_LA_Map$LocalAuthority[!(Unique_LA_Map$LocalAuthority %in% Unique_LA_Diseases$LocalAuthority)])

In_Diseases_Not_Map<-data.frame(LocalAuthority=
  Unique_LA_Diseases$LocalAuthority[!(Unique_LA_Diseases$LocalAuthority %in% Unique_LA_Map$LocalAuthority)])

In_Both<-base::merge(Unique_LA_Map,Unique_LA_Diseases, by="LocalAuthority")
```
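
The same comparison can be sketched more compactly with base R's set functions. The vectors below are invented stand-ins for the two name lists, not the real data.

```r
# Hypothetical name lists standing in for the map and disease data
map_names <- c("Exeter", "East Dorset", "Kingston upon Hull, City of", "Glasgow City")
disease_names <- c("Exeter", "City of Kingston upon Hull")

in_both <- intersect(map_names, disease_names)             # shared names
in_map_not_diseases <- setdiff(map_names, disease_names)   # only on the map
in_diseases_not_map <- setdiff(disease_names, map_names)   # only in the disease data

# Counts of shared and unshared names
c(both = length(in_both),
  map_only = length(in_map_not_diseases),
  diseases_only = length(in_diseases_not_map))
```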

I've plotted the results on a **Venn diagram**. Not the most beautiful, but quite informative.

```{r}

library(VennDiagram)
grid.newpage()
draw.pairwise.venn(area1 = nrow(Unique_LA_Diseases), 
                   area2 = nrow(Unique_LA_Map), 
                   cross.area = nrow(In_Both), 
                   category = c("n in Diseases Data", "n in Map Data"),
                   fill = c("blue", "red")
                   )

```

276 local authorities are in both data sets, 104 are only in the map data and 12 are only in the disease data. A large part of the explanation is that the map includes Scottish local authorities, but the disease data only covers England and Wales. Looking at the data frames, there are some instances where the name is stated differently in each set despite referring to the same authority, e.g. "`r In_Diseases_Not_Map$LocalAuthority[[3]]`" and "`r In_Map_Not_Diseases$LocalAuthority[[46]]`". And there are some, like "`r In_Map_Not_Diseases$LocalAuthority[[25]]`", that are on the map but don't seem to have a match at all.
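
One possible way to reconcile names that differ only in word order is to rewrite "X, City of" as "City of X" before merging. The helper below is a hypothetical sketch, not part of the analysis above, and it would only catch this one pattern. Joining on an official area code, if the disease data carried one, would probably be more robust.

```r
# Hypothetical helper: move a trailing ", City of" or ", County of" to the front
normalise_la_name <- function(x) {
  sub("^(.*), (City of|County of)$", "\\2 \\1", x)
}

# Example names; the first needs reordering, the second is left unchanged
map_names <- c("Kingston upon Hull, City of", "East Dorset")
normalise_la_name(map_names)
```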

My conclusion is that, whilst making maps of infectious disease occurrences is a great way of getting insights out of the data, I would need some domain knowledge about mapping local authorities to take this much further.